Dataset Description
176:53:28 hours | 113 GB | 456 speakers | 77,443 audio segments | 48 kHz | 16-bit WAV
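For readers who obtain the corpus, the following is a minimal sketch of how a single segment could be checked against the stated 48 kHz, 16-bit WAV format. It uses only Python's standard `wave` module and assumes the files are plain PCM WAV; the helper names and the file name in the usage comment are hypothetical and are not taken from the corpus documentation.

```python
import wave

# Values taken from the dataset summary line.
EXPECTED_RATE_HZ = 48_000
EXPECTED_BITS = 16


def describe_wav(path):
    """Return basic header fields of a PCM WAV file (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return {
            "sample_rate_hz": rate,
            "bits_per_sample": wav.getsampwidth() * 8,  # bytes -> bits
            "channels": wav.getnchannels(),
            "duration_s": wav.getnframes() / rate,
        }


def matches_stated_format(info):
    """True if the header agrees with the stated sample rate and bit depth."""
    return (
        info["sample_rate_hz"] == EXPECTED_RATE_HZ
        and info["bits_per_sample"] == EXPECTED_BITS
    )


# Usage (hypothetical file name):
# info = describe_wav("segment_0001.wav")
# print(info, matches_stated_format(info))
```

The channel count is reported but not checked, because the summary does not state whether the recordings are mono or stereo.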
Bodo, one of the scheduled languages of India, is a tonal language.
It has two clearly distinguishable tones, known as Low and High.
The language belongs to the Tibeto-Burman linguistic family. It is the
language of the Bodos, who are a major tribe of the Indian state of Assam.
The LDC-IL Bodo speech data was collected from the Chirang, Baksa,
Sonitpur, Udalguri, Kamrup, Barpeta, and Kokrajhar districts of the Indian
state of Assam, covering the Bwrdwnari, Eastern, and Standard dialects.
The data was collected from speakers of both genders and different age groups.
Speech corpus details:
Total speakers: 456 (220 female and 236 male)
Domain | Audio Segments | Duration per Domain (hh:mm:ss)
Contemporary Text (News) | 411 | 53:47:56
Creative Text | 413 | 26:47:07
Sentence | 10,257 | 09:16:54
Date Format | 938 | 01:58:08
Command and Control Words | 12,348 | 14:19:32
Person Name | 8,222 | 14:49:44
Place Name | 4,115 | 05:17:14
Most Frequent Word - Part | 12,397 | 14:34:05
Most Frequent Word - Full Set | 6,994 | 04:30:14
Phonetically Balanced | 15,999 | 20:07:33
Form and Function - Word | 6,383 | 08:28:25
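The per-domain figures in the table above can be cross-checked against the headline totals with a little hh:mm:ss arithmetic. The following is a minimal sketch; the domain names, segment counts, and durations are copied from the table, while the helper names are illustrative only.

```python
def to_seconds(hms):
    """Convert an 'hh:mm:ss' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def to_hms(total_seconds):
    """Convert a number of seconds back to 'hh:mm:ss'."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# (audio segments, duration) per domain, as listed in the table.
DOMAINS = {
    "Contemporary Text (News)": (411, "53:47:56"),
    "Creative Text": (413, "26:47:07"),
    "Sentence": (10_257, "09:16:54"),
    "Date Format": (938, "01:58:08"),
    "Command and Control Words": (12_348, "14:19:32"),
    "Person Name": (8_222, "14:49:44"),
    "Place Name": (4_115, "05:17:14"),
    "Most Frequent Word - Part": (12_397, "14:34:05"),
    "Most Frequent Word - Full Set": (6_994, "04:30:14"),
    "Phonetically Balanced": (15_999, "20:07:33"),
    "Form and Function - Word": (6_383, "08:28:25"),
}

# Headline totals as stated on this page.
STATED_SEGMENTS = 77_443
STATED_DURATION = "176:53:28"

total_segments = sum(count for count, _ in DOMAINS.values())
total_seconds = sum(to_seconds(duration) for _, duration in DOMAINS.values())

print("segments:", total_segments, "(stated:", STATED_SEGMENTS, ")")
print("duration:", to_hms(total_seconds), "(stated:", STATED_DURATION, ")")
```

With the figures as listed, the per-domain sums come to 78,477 segments and 173:56:52, which do not match the stated 77,443 segments and 176:53:28. The corpus documentation would be the place to resolve the difference.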
A detailed explanation of the Bodo Speech Corpus will be available in the Bodo
Speech Data Documentation.
When citing this corpus in research, please use the following references:
- Ramamoorthy, L., Narayan Choudhary, Bridul Basumatary & Farson Daimary. 2019. Bodo Raw Speech Corpus. Central Institute of Indian Languages, Mysore.
- Choudhary, Narayan, Rajesha N., Manasa G. & L. Ramamoorthy. 2019. “LDC-IL Raw Speech Corpora: An Overview” in Linguistic Resources for AI/NLP in Indian Languages. Central Institute of Indian Languages, Mysore. pp. 160-174.
Item specifics
- Authors: Ramamoorthy L., Narayan Choudhary, Bridul Basumatary, Farson Daimary
- Corpus Type: Raw Corpus
- Catalogue Number: 1112
- ISBN: 978-81-7343-211-8
- Data Source: On Field
- Duration: 176:53:28
- # of Audio Segments: 77443
- Release Date: 04-Apr-2019
- Terms and Conditions: General instructions for use of the resources provided by LDC-IL.